Abstract

Originally, quantum probability theory was developed to an- 
alyze statistical phenomena in quantum systems, where clas- 
sical probability theory does not apply, because the lattice of 
measurable sets is not necessarily distributive. On the other 
hand, it is well known that the lattices of concepts, that arise 
in data analysis, are in general also non-distributive, albeit 
for completely different reasons. In his recent book, van
Rijsbergen (2004) argues that many of the logical tools developed
for quantum systems are also suitable for applications
in information retrieval. I explore the mathematical support 
for this idea on an abstract vector space model, covering sev- 
eral forms of data analysis (information retrieval, data min- 
ing, collaborative filtering, formal concept analysis. . . ), and 
roughly based on an idea from categorical quantum mechanics
(Abramsky & Coecke 2004; Coecke & Pavlovic 2007). It
turns out that quantum (i.e., noncommutative) probability 
distributions arise already in this rudimentary mathematical 
framework. We show that a Bell-type inequality (Bell 1964)
must be satisfied by the standard similarity measures, if they 
are used for preference predictions. The fact that already a 
very general, abstract version of the vector space model yields 
simple counterexamples for such inequalities seems to be an 
indicator of a genuine need for quantum statistics in data anal- 
ysis. 



Introduction 

Until recently, Computer Science was mainly concerned 
with data storage and processing in purpose-built data bases 
and computers. With the advent of the Web and social com- 
putation, the task of finding and understanding information 
arising from local interactions in spontaneously evolving 
computational networks and data repositories has taken cen- 
ter stage. 

As computers evolved from calculators, the key paradigm 
of Computer Science was computation-as-calculation, with 
the Turing Machine construed as a generic calculator, and 
with data processing performed by a small set of local 
operations. As computers got connected into networks, 
and captured a range of social functions, the paradigm 
of computation-as-communication emerged, with data processing
performed not only locally, but also through distribution,
merging, and association of data sets through various
communicating processes. Such non-local data processing
has been implemented through markets, elections, and
many other social mechanisms for a very long time, albeit on 
a smaller scale, with less concrete infrastructure, and with 
more complex computational agents. A new family of its 
implementations is based on a new computational platform, 
which is not any more the Computer, or even its operating 
system, but the Web, and its knowledge systems. 

But while the interfaces of the local computational pro- 
cesses are defined to be the interfaces of the comput- 
ers which perform them, the carriers of computation-as- 
communication do not come with clearly defined interfaces. 
The task of finding and supplying reliable data within a mar- 
ket, or on the Web, or in a social group, carries with it many 
deep problems. Two of them are particularly relevant for this 
work. 

Problem of partial information and indeterminacy 

Data processing in a network is ongoing. On the other hand, 
the data sets are usually incomplete, and information needs 
to be extracted from such incomplete sets. E.g., a task in a
recommender system is to extrapolate which movies (books,
music, ...) a user will like, from a sparse sample of those that
she has previously rated. In information retrieval, the task
is to extrapolate which information is relevant for a query, 
from a small set of tokens characterizing the query on one 
hand, and the information on the other hand. 

In the standard model of data analysis, succinctly presented
e.g. in (Azar et al. 2001), it is assumed that a matrix
of random variables, containing complete information
about the relevant properties of the objects of interest, exists 
out there (in some sort of a Platonic heaven of information), 
and can be sampled. The problem of data analysis is that the 
sampling process is noisy, and partial; more specifically, that 
the distributions of the random variables are distorted by an 
error process, and by an omission process. The task of data 
analysis is to eliminate the effects of these processes, and re- 
construct a good approximation of the original information. 

While mathematically convenient, and computationally 
effective, this model does not seem very realistic. If we 
instantiate it to a recommender system again, then its ba- 
sic assumption becomes that each user has a completely de- 
fined preference distribution, albeit only over the items that 
he has used, and that the recommender system just needs
to reconstruct this preference distribution. But if we zoom
in, and ask the user himself, he will often be unable to pre- 
cisely reconstruct his own preference distribution. If we ask 
him to rate some items again, he will often assign different 
ratings. One reason is that information processing is on- 
going, and that the preferences evolve and change. If we 
zoom in even further, we will find that the state of a user's
preferences is usually not completely determined even in a 
completely static model: right after watching a movie, one 
usually needs to toss a "mental coin" to decide whether to 
assign 2 or 3 stars, say, to the performance of an actor; or 
to decide whether to pay more attention, while watching the 
movie, to this or that aspect, music, colors. . . 

While the indeterminacy of information in a network can 
be reduced to an effect of noise, like in the standard model, 
and averaged out, it is interesting to ponder whether view- 
ing this indeterminacy as an essential feature of network 
computation, rather than a bug, may lead to more realistic 
models of information systems. Is the "mental coin", which 
resolves the superposition of the many components of my 
preferences when I need to measure them, akin to a real coin, 
which we all agree is governed by completely deterministic 
laws of classical physics, and its randomness is just the ap- 
pearance of its complex behavior; or is this "mental coin" 
governed by a more fundamental form of randomness, like 
the one that occurs in quantum mechanics, causing the superposition
of many states to collapse under measurement?

Problem of classification and latent semantics 

The task of conceptualizing data has been formulated in 
many ways. In information retrieval, the central task is to 
determine the relevance of data with respect to a query. In 
recommender systems, the implicit query is always: "What 
will I like, given my past choices and rankings?", and the 
task is to find the relevant recommendations. In order to 
tackle such tasks, one classifies the data on one hand, the 
queries on the other, and aligns the two classifications, in or- 
der to extrapolate the future choices from the past choices. 
— But what are these classifications based on? 

The simplest approach is based on keywords. But even 
classifying a corpus of purely textual documents, viewed as 
bags of words, according to the frequency of the occurrences 
of the relevant keywords, leads to significant problems: pol- 
ysemy, homonymy, synonymy. The problem becomes very 
difficult when it comes to classifying families of non-textual 
objects: images, music, video, film. Only a small part of 
their correlations can be captured by connecting the key- 
words, captions, or other forms of textual annotations. 

Latent semantics correlates data by extracting their intrin- 
sic structure. For instance, the central piece of the origi- 
nal Google search engine, distinguishing it from other sim- 
ilar engines, was that the keyword search was supported 
by PageRank (Page et al. 1998), a reputation ranking of the
Web pages, extracted from their intrinsic hyperlink struc- 
ture. Even for the keyword search, the crucial step was to 
recognize this latent variable (Everitt 1984) extracting rele- 
vance from non-local network structure, rather than from lo- 
cal term occurrence. Such semantical support is even more 
critical for search and retrieval of non-textual information,
on the Web and in other data spaces.

Overview of latent semantics 

We consider the case when two types of data assign the 
meaning to each other. 

Pattern matrices 

Latent semantics is generally given as a map

$$A : J \times U \to R$$

where 

• J is a set of objects, or items, 

• U is a set of properties, or users, 

• R is a set of values, or ratings. 

This map is conveniently presented as a pattern matrix
$A = (A_{iu})_{J \times U}$. The entry $A_{iu}$ can be intuitively written
as a model relation $i \models u$, especially when $R = \{0, 1\}$. In
general, it can be construed as the degree to which the object
$i$ satisfies the property, or the user, $u$. While the ratings
$R$ usually carry the structure of an ordered rig¹, the attributes
$U$ often carry a more general algebraic structure, whereas
the behaviors of the objects in $J$ may be expressed coalgebraically.
Clearly, the rig structure of $R$ is just enough to
support the usual matrix composition. Sometimes, but not
always, we also assume that $R$ has no nilpotents, so that it
can be embedded in an ordered field.



Examples.

domain            J             U            R            A
----------------  ------------  -----------  -----------  ----------------
text analysis     documents     terms        ℕ            occurrence count
measurement       instances     quantities   ℝ            outcome
user preference   items         users        {0,...,5}    rating
topic search      authorities   hubs         ℕ            hyperlinks
concept analysis  objects       attributes   {0,1}        satisfaction
elections         candidates    voters       {0,...,n}    preference
market            producers     consumers                 delivery
digital images    images        pixels       [0,1]        intensity

Balancing and normalization 

Notation. For every vector $x = (x_k)_{k=1}^n$, we define

• the average (expectation) $E(x) = \frac{1}{n} \sum_{k=1}^n x_k$,

• the $\ell_2$-norm $\|x\|_2 = \sqrt{\sum_{k=1}^n |x_k|^2}$,

• the $\ell_\infty$-norm $\|x\|_\infty = \bigvee_{k=1}^n |x_k|$.

Item balancing of a semantics matrix $A$ reduces each of its
rows $A_{i\bullet}$, corresponding to the item $i$, to a row vector
$A^\circ_{i\bullet}$, with the expected value 0, defined

$$A^\circ_{i\bullet} = A_{i\bullet} - E(A_{i\bullet})$$

The unassigned ratings in $A_{i\bullet}$ are padded by zeros.

In an item-balanced matrix, the difference between the items
with a higher average rating and the items with a lower
average rating is factored out. Only the satisfaction profile
of each item is recorded, over the set of users who have
assigned it a better-than-average or worse-than-average
rating. The average and unassigned ratings are identified,
and both become 0.

¹ A rig $R = (R, +, \cdot, 0, 1)$ is a "ring without the negatives".
This means that $(R, +, 0)$ and $(R, \cdot, 1)$ are commutative monoids
satisfying $a(b + c) = ab + ac$ and $0a = 0$. The typical examples
include natural numbers, non-negative reals, but also distributive
lattices, which generally do not embed in a ring.

User balancing of a semantics matrix $A$ reduces each of
its columns $A_{\bullet u}$, corresponding to the user $u$, to a column
vector $A^\circ_{\bullet u}$, with the expected value 0, by setting

$$A^\circ_{\bullet u} = A_{\bullet u} - E(A_{\bullet u})$$

The unassigned ratings are again padded by zeros. 

In a user-balanced matrix, users' different rating habits, 
that some of them are more generous than others, are fac- 
tored out. Only the satisfaction profile of each user is 
recorded, over the set of all items that she has rated. The 
average and unassigned ratings are identified, both with 0. 
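The two balancing operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample matrix, the use of `None` to mark unassigned ratings, and the helper names `balance_rows` and `balance_columns` are hypothetical, and the averages are taken over the assigned ratings only, after which the unassigned entries are padded by zeros.

```python
from statistics import mean

# Hypothetical raw ratings: rows are items, columns are users.
# None marks an unassigned rating (an assumption of this sketch).
A = [
    [5, 3, None],
    [4, None, 2],
    [1, 1, 4],
]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def balance_rows(matrix):
    """Subtract from each row the mean of its assigned entries;
    unassigned entries are padded by 0."""
    balanced = []
    for row in matrix:
        rated = [r for r in row if r is not None]
        avg = mean(rated) if rated else 0
        balanced.append([0 if r is None else r - avg for r in row])
    return balanced

def balance_columns(matrix):
    """User balancing is item balancing applied to the transposed matrix."""
    return transpose(balance_rows(transpose(matrix)))

item_balanced = balance_rows(A)     # each row now sums to 0
user_balanced = balance_columns(A)  # each column now sums to 0
```

In this sketch an entry equal to the average and an unassigned entry both end up as 0, which is the identification described above.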

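Item normalization of a semantics matrix $A$ factors its rows
into unit vectors; the user normalization factors its columns
into unit vectors, by setting

$$A_{i\bullet} \mapsto \frac{A_{i\bullet}}{\|A_{i\bullet}\|_2} \qquad\qquad A_{\bullet u} \mapsto \frac{A_{\bullet u}}{\|A_{\bullet u}\|_2}$$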
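As a rough illustration of the normalization step, the following Python sketch rescales rows or columns to unit $\ell_2$-norm. The sample matrix and the function names are assumptions made for the example; leaving all-zero vectors unchanged is a choice of this sketch, not something the text prescribes.

```python
import math

def l2_norm(vector):
    return math.sqrt(sum(abs(v) ** 2 for v in vector))

def normalize_rows(matrix):
    """Rescale each nonzero row to a unit vector; zero rows stay as they are."""
    result = []
    for row in matrix:
        n = l2_norm(row)
        result.append([v / n for v in row] if n > 0 else list(row))
    return result

def normalize_columns(matrix):
    """Column normalization is row normalization of the transpose."""
    cols = normalize_rows([list(c) for c in zip(*matrix)])
    return [list(r) for r in zip(*cols)]

A = [[3.0, 4.0],
     [0.0, 2.0]]
rows_unit = normalize_rows(A)     # item normalization
cols_unit = normalize_columns(A)  # user normalization
```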

Comment. The purpose of balancing and normalization of 
raw semantic matrices is to factor out the aspects of rating 
that are irrelevant for the intended analysis. Whether a par- 
ticular adjustment is appropriate or not depends on the in- 
tent, and on the available data. E.g., padding the available 
ratings by assigning the average rating to all unrated items 
may be useful in some cases, but it skews the data when the 
sample is small.² In the rest of the paper, we assume that all
such adjustments have been applied to data as appropriate, 
and we focus on the methods for extracting information from 
them. 

Classification 

Through pattern matrices and latent semantics, the ob- 
jects and the properties lend a meaning to each other. 
The simple method for extracting that meaning is based 
on the general ideas of Principal Component Analysis 
(Jolliffe 1986). This method underlies not only the vector
space based approaches, like Latent Semantic Indexing
(LSI) (Deerwester et al. 1990), or Hyperlink-Induced Topic
Search (HITS) (Kleinberg 1999), but also, albeit in a less obvious
way, Formal Concept Analysis (FCA) (Wille 1982),
and some other approaches. The general idea is that the 
latent semantical structures can be obtained by factoring 
the pattern matrix through suitable transformations, required 
to preserve a conceptual distance between the objects, as 
well as between their properties. These distance-preserving 
transformations can be captured under the abstract notion of 
isometry. 



² E.g., when only one rating is available from a user, then
extrapolating his average rating to the unrated items simply erases all
available information.



Suppose that the rig of values is given with an involutive
automorphism $\overline{(-)} : R \to R$, called conjugation. If the
values are the complex numbers, $R = \mathbb{C}$, then of course
$\overline{a + ib} = a - ib$. For general rigs $R$, conjugation sometimes
boils down to $\overline{a} = a$. In any case, any pattern matrix
$A = (A_{iu})_{J \times U}$ induces an adjoint matrix $A^\dagger = (A^\dagger_{ui})_{U \times J}$, whose
entries are defined to be $A^\dagger_{ui} = \overline{A_{iu}}$. The inner product of
vectors $x, y \in R^J$ can now be defined as $\langle x | y \rangle = y^\dagger \circ x$.

Definitions. An isometry is a map $U : \mathcal{A} \hookrightarrow \mathcal{B}$ such
that $\langle Ux | Uy \rangle = \langle x | y \rangle$ holds for all $x, y$. Equivalently, this
means that $U^\dagger U = \mathrm{id}_{\mathcal{A}}$. It is a unitary if both $U$ and $U^\dagger$ are
isometries.
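For real matrices, where conjugation is trivial and the adjoint is the transpose, the defining condition of an isometry can be checked directly. The sketch below is only illustrative: the matrix `U` and the helper names are assumptions chosen for the example. It shows a map that is an isometry without being a unitary.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(p, q):
    qt = transpose(q)
    return [[sum(a * b for a, b in zip(row, col)) for col in qt] for row in p]

def inner(x, y):
    # <x|y> = y^T x for real vectors
    return sum(a * b for a, b in zip(x, y))

def apply(m, x):
    return [inner(row, x) for row in m]

def is_identity(m, tol=1e-9):
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(len(m)) for j in range(len(m[i])))

def is_isometry(u):
    # U is an isometry when U^T U is the identity on its domain.
    return is_identity(matmul(transpose(u), u))

# A hypothetical embedding of a 2-dimensional space into a 3-dimensional one.
U = [[0.6, 0.0],
     [0.8, 0.0],
     [0.0, 1.0]]

x, y = [1.0, 2.0], [3.0, -1.0]
preserved = abs(inner(apply(U, x), apply(U, y)) - inner(x, y)) < 1e-9
unitary = is_isometry(U) and is_isometry(transpose(U))
```

Here `U` preserves inner products, but its transpose does not, so `U` is not a unitary.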

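An isometric decomposition of an operator $B : \mathcal{U} \to \mathcal{J}$
consists of isometries $V : \widetilde{\mathcal{U}} \hookrightarrow \mathcal{U}$ and $W : \widetilde{\mathcal{J}} \hookrightarrow \mathcal{J}$
such that there is a (necessarily unique) map $\widetilde{B} : \widetilde{\mathcal{U}} \to \widetilde{\mathcal{J}}$
satisfying $B = W \widetilde{B} V^\dagger$.

The spectral decomposition $B = \widehat{W} \widehat{B} \widehat{V}^\dagger$ is minimal among
$B$'s isometric decompositions, in the sense that for every
isometric decomposition $B = W \widetilde{B} V^\dagger$, there is an isometric
decomposition $\widetilde{B} = W' \widehat{B} V'^\dagger$, such that $\widehat{W} = W W'$ and $\widehat{V} = V V'$.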
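We also need the following notion.

Correlation matrices are the self-adjoint matrices in the
form $M^J = AA^\dagger$ and $M^U = A^\dagger A$, i.e.

$$M^J_{ij} = \sum_{u \in U} A_{iu} \cdot \overline{A_{ju}} \qquad\qquad M^U_{uv} = \sum_{i \in J} \overline{A_{iu}} \cdot A_{iv}$$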
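A small numerical instance may help to fix the two correlation matrices. The sketch assumes real values, so that the adjoint is the transpose; the pattern matrix `A` (three items, two users) is a made-up example.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(p, q):
    qt = transpose(q)
    return [[sum(a * b for a, b in zip(row, col)) for col in qt] for row in p]

def trace(m):
    return sum(m[k][k] for k in range(len(m)))

# Hypothetical pattern matrix: rows are items, columns are users.
A = [[1, 2],
     [0, 1],
     [3, 0]]

M_J = matmul(A, transpose(A))  # item correlations, 3 x 3
M_U = matmul(transpose(A), A)  # user correlations, 2 x 2
```

Both matrices are symmetric and they have the same trace, as expected from the fact that they share their nonzero spectrum.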



Examples of classification through isometric 
decomposition 



Given a pattern matrix $J \times U \xrightarrow{A} R$, we set

$$\mathcal{J} = R^J \qquad\qquad \mathcal{U} = R^U$$

so that $A$ becomes a linear operator $A : \mathcal{U} \to \mathcal{J}$, defined by
the usual matrix action on the vectors.



Latent Semantic Indexing. (Deerwester et al. 1990) Let
the rig of values $R$ be the field of real numbers $\mathbb{R}$, with the
trivial conjugation $\overline{r} = r$. This means that $\mathcal{J} = \mathbb{R}^J$ and
$\mathcal{U} = \mathbb{R}^U$ are real vector spaces. The pattern matrix
$J \times U \xrightarrow{A} \mathbb{R}$ induces the linear operator $\mathcal{U} \xrightarrow{A} \mathcal{J}$, and the
adjoint $\mathcal{J} \xrightarrow{A^\dagger} \mathcal{U}$ is just the transpose.



The isometric decomposition boils down to the singular
value decomposition. The isometries $V : \widetilde{\mathcal{U}} \hookrightarrow \mathcal{U}$ and
$W : \widetilde{\mathcal{J}} \hookrightarrow \mathcal{J}$ are obtained by the spectral decomposition
of the symmetric matrices $M^U = A^\dagger A$ and $M^J = AA^\dagger$.
Since both decompose through the same rank space, with
the same spectrum $\Lambda = \{\lambda_1 > \lambda_2 > \dots > \lambda_n\}$, we
get a positive diagonal matrix $\Lambda$ such that $A^\dagger A = V \Lambda V^\dagger$
and $AA^\dagger = W \Lambda W^\dagger$, from which $A = W D V^\dagger$ follows for
$D = \sqrt{\Lambda}$.
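The relation between the two spectral decompositions can be illustrated on the leading singular triple alone. The sketch below uses plain power iteration on $A^\dagger A$, which is a swapped-in textbook technique rather than the method of any particular LSI implementation; the matrix `A`, the starting vector, and the iteration count are assumptions of the example.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def norm(x):
    return math.sqrt(sum(c * c for c in x))

def top_singular_triple(a, iterations=100):
    """Power iteration on A^T A: returns (u, sigma, v) with A v = sigma u."""
    at = transpose(a)
    v = [1.0] + [0.0] * (len(a[0]) - 1)  # assumed not orthogonal to the top eigenvector
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))     # apply M^U = A^T A
        n = norm(w)
        v = [c / n for c in w]
    av = matvec(a, v)
    sigma = norm(av)                     # sigma = sqrt(lambda_1)
    u = [c / sigma for c in av]
    return u, sigma, v

A = [[3.0, 0.0],
     [4.0, 5.0]]
u, sigma, v = top_singular_triple(A)

# v is an eigenvector of A^T A and u is an eigenvector of A A^T,
# both for the same eigenvalue sigma ** 2.
lhs_u = matvec(A, matvec(transpose(A), u))
```

The vector `v` plays the role of a "taste" and `u` the role of the corresponding "style", paired by the shared eigenvalue.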

The eigenspaces of $M^U$ and $M^J$ can be viewed as pure
topics captured by the pattern matrix $A$. The eigenvalues
correspond to the degree of semantical relevance of each
topic in the data set from which the pattern matrix was extracted.
If $U$ are users and $J$ items, then the eigenspaces in
$\mathcal{U}$ can be thought of as tastes, the eigenspaces in $\mathcal{J}$ as styles.
Remarkably, there is a bijective correspondence between the
two, and the eigenvalues quantify the correlations. As an instance
of the same decomposition, Kleinberg's (1999) analysis
of Hyperlink-Induced Topic Search (HITS) yields a similar
correspondence between the hubs and the authorities on
the Web. In all cases, the underlying view is that the information
consumers and the information producers, lending
each other the latent semantics, share a uniform conceptual 
space. An even simpler presentation of that optimistic view
is offered by Formal Concept Analysis.



Formal Concept Analysis. (Ganter, Stumme, & Wille 2005)
Let the rig of values $R$ now be the distributive lattice
$B = (2, \vee, \wedge, 0, 1)$, over the underlying set $2 = \{0, 1\}$, with
the negation $\neg : B \to B$ as the conjugation $\overline{i} = \neg i$. Note
that this is now an antimorphism of $B = (2, \vee, \wedge, 0, 1)$ with
the dual lattice $B^{\mathrm{op}} = (2, \wedge, \vee, 1, 0)$. The space of the objects
is thus the boolean lattice $\mathcal{J} = 2^J$, ordered by inclusion, whereas
the space of the properties is the boolean lattice $\mathcal{U} = 2^U$,
ordered by reverse inclusion.

Given a pattern matrix, which in this case boils down to a
binary relation $J \times U \xrightarrow{A} 2$, we consider the induced map
$U \xrightarrow{\neg A^\dagger} 2^J$, and derive the monotone maps

$$B(X) = \{i \in J \mid \exists u \in X.\ \neg\, iAu\}$$
$$B^\dagger(Y) = \{u \in U \mid \forall i \notin Y.\ iAu\}$$

which are adjoint to each other in the sense

$$B(X) \subseteq Y \iff X \subseteq B^\dagger(Y)$$

and by conjugating yield the Galois connection

$$Y \subseteq \neg B(X) \iff X \subseteq B^\dagger(\neg Y)$$

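The spectral decomposition is obtained by setting

$$\widetilde{\mathcal{U}} = \{X \in 2^U \mid M^U(X) = X\}$$
$$\widetilde{\mathcal{J}} = \{Y \in 2^J \mid M^J(Y) = Y\}$$

where the closure operators $M^U = (B^\dagger \neg) \circ (\neg B)$ and
$M^J = (\neg B) \circ (B^\dagger \neg)$ unfold to

$$M^U(X) = \{u \in U \mid \forall i \in J.\ (\forall v \in X.\ iAv) \Rightarrow iAu\}$$
$$M^J(Y) = \{i \in J \mid \forall u \in U.\ (\forall j \in Y.\ jAu) \Rightarrow iAu\}$$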

Note that $M^U$ is obtained by composing the matrices $\neg B$
and $B^\dagger \neg$ over the space 2, where the composition $\circ$ is dual
to the usual one, i.e. $(P \circ Q)_{ik} = \bigwedge_j (P_{ij} \vee Q_{jk})$.

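It is easy to see that the lattices of closed sets $\widetilde{\mathcal{U}}$ and $\widetilde{\mathcal{J}}$ are
isomorphic, because they are both isomorphic with

$$\mathcal{L} = \{(X, Y) \in \wp(U) \times \wp(J) \mid \neg B(X) = Y \ \wedge\ B^\dagger(\neg Y) = X\}$$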

This is the form in which a concept lattice is usually presented
(Ganter, Stumme, & Wille 2005). The fact that the
spectral decomposition is minimal means that it correlates
users' strongest tastes, captured in $\widetilde{\mathcal{U}}$, with items' strongest
styles, captured in $\widetilde{\mathcal{J}}$.
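The closure operators and the resulting concept lattice can be computed by brute force for a small context. The sketch below is illustrative only: the objects, attributes, and relation are invented, and the names `extent` and `intent` are the customary FCA terms for the maps written $\neg B$ and $B^\dagger \neg$ above.

```python
from itertools import combinations

# Hypothetical formal context: which object satisfies which attribute.
J = ["duck", "dog", "carp"]
U = ["flies", "swims", "has_legs"]
A = {("duck", "flies"), ("duck", "swims"), ("duck", "has_legs"),
     ("dog", "swims"), ("dog", "has_legs"),
     ("carp", "swims")}

def extent(X):
    """Objects satisfying every attribute in X (the map written as 'not B')."""
    return frozenset(i for i in J if all((i, u) in A for u in X))

def intent(Y):
    """Attributes satisfied by every object in Y (the map B-dagger after negation)."""
    return frozenset(u for u in U if all((i, u) in A for i in Y))

def close_attributes(X):  # M^U
    return intent(extent(X))

def close_objects(Y):     # M^J
    return extent(intent(Y))

def subsets(elements):
    return [frozenset(c) for r in range(len(elements) + 1)
            for c in combinations(elements, r)]

closed_attribute_sets = {X for X in subsets(U) if close_attributes(X) == X}
closed_object_sets = {Y for Y in subsets(J) if close_objects(Y) == Y}

# Each closed attribute set is paired with its extent: these pairs are the concepts.
concepts = {(X, extent(X)) for X in closed_attribute_sets}
```

In this toy context the two families of closed sets have the same size, and `extent` maps one bijectively onto the other, in line with the isomorphism stated above.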

Remark. While LSI is a standard, well-studied data min- 
ing method, FCA has been less familiar in the data analysis 
communities, although an early proposal of a concept-lattice 
approach can be traced back to the earliest days of the information
retrieval research (Salton 1968), predating both FCA
and even the standard vector space model. More recently,
though, the applications of FCA in information retrieval
have been tested and explained (Carpineto & Romano 2004;
Priss 2006; Poshyvanyk & Marcus 2007). The succinct presentation
of LSI and FCA as special cases of the same pattern,
in our abstract model above, points to the fact that the
Singular Value Decomposition, on which LSI is based, and 
the Galois Connections, that lead to FCA, both subsume un- 
der the abstract structure of isometric decomposition, just 
instantiated to the rig of reals for LSI, and to the boolean 
rig for FCA. The simple structure of isometric decomposi- 
tion, and the corresponding notion of conceptual distance, 
can thus be construed as the basic building block of seman- 
tical classification in data analysis. It turns out that already 
this rudimentary structure leads into quantum statistics. 

Concept lattices are not distributive 

While classical measures are defined over σ-algebras, which
are distributive (and boolean) as lattices, quantum measures
are defined over a more general family of algebras,
which need not be distributive lattices, but only orthomodular
(Meyer 1986; Meyer 1993; Rédei & Summers 2006).

A crucial, frequently made observation, eventually lead- 
ing into quantum statistics, is that the lattices of concepts, 
and of topics, induced by the various forms of latent seman- 
tics, are not distributive. Indeed, since the lattice structure is 
induced by 



$$x \wedge y = x \cap y$$
$$x \vee y = M(x \cup y)$$



the closure operator M often disturbs the distributivity of 
the underlying set-theoretic operations. The observation that 
this non-distributivity of concept lattices lifts to the realm of 
information retrieval is due to van Rijsbergen. For the reader's
convenience, we repeat the intuitive example of
$x \wedge (y \vee z) \neq (x \wedge y) \vee (x \wedge z)$ from (van Rijsbergen 2004, p. 36). In a
taxonomy of animals, take $x$ = "bird", $y$ = "human" and
$z$ = "lizard". Then both $x \wedge y$ and $x \wedge z$ are empty, so that
$(x \wedge y) \vee (x \wedge z)$ remains empty. On the other hand, $y \vee z$
= "vertebrates", because vertebrates are the smallest class
including both humans and lizards. Hence $x \wedge (y \vee z)$ =
"birds" is not empty.

The point is that such phenomena arise from all forms of 
latent semantics. But beyond this point, there are even more 
specific indications of quantum statistics at work. 
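The bird, human and lizard example can be replayed mechanically. The sketch below assumes a deliberately tiny taxonomy whose only classes are the empty class, the three species, and the vertebrates; the closure of a set is taken to be the smallest class containing it. The class list and function names are assumptions of the example.

```python
# Hypothetical taxonomy: the closed sets ("classes") of a toy concept lattice.
CLASSES = [
    frozenset(),
    frozenset({"bird"}),
    frozenset({"human"}),
    frozenset({"lizard"}),
    frozenset({"bird", "human", "lizard"}),  # "vertebrates"
]

def closure(s):
    """Smallest class containing s; it is unique because the
    family of classes is closed under intersection."""
    return min((c for c in CLASSES if s <= c), key=len)

def meet(x, y):
    return x & y

def join(x, y):
    return closure(x | y)

x = frozenset({"bird"})
y = frozenset({"human"})
z = frozenset({"lizard"})

lhs = meet(x, join(y, z))           # x AND (y OR z)
rhs = join(meet(x, y), meet(x, z))  # (x AND y) OR (x AND z)
```

The left-hand side is the class of birds, while the right-hand side is empty, so distributivity fails exactly because the join is a closure of the union rather than the union itself.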

Similarity and ranking 

At the core of the vector space model of information re- 
trieval, data mining and other forms of data analysis lies the 
idea that the basic similarity measure, applicable to pairs of 
objects, or of attributes, or to the mixtures thereof, is ex- 
pressible in terms of the inner product of their normalized 
(often also balanced) vectors: 



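$$s(i, j) = \langle A_{i\bullet} | A_{j\bullet} \rangle = \sum_{u \in U} A_{iu} \cdot A_{ju}$$

$$s(u, v) = \langle A_{\bullet u} | A_{\bullet v} \rangle = \sum_{i \in J} A_{iu} \cdot A_{iv}$$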

More generally, using the inner product one can also mea- 
sure the similarity of pure topics x and y, viewed as linear 
combinations of the property vectors: 



$$s_M(x, y) = \langle x | A^\dagger A | y \rangle = \langle Ax | Ay \rangle$$



In the same vein, the ranking of mixed topics, represented by 
the subspaces E of the space of properties, then corresponds 
to the trace operator: 

$$\mathrm{tr}_M(x) = \langle x | A^\dagger A | x \rangle = \langle Ax | Ax \rangle$$

$$\mathrm{tr}_M(E) = \sum_{x \in B_E} \mathrm{tr}_M(x)$$



Noting that a correlation matrix $M = A^\dagger A$ amounts to what
is in quantum statistics called an observable, we see that 
the ranking measures, already in the standard vector model, 
correspond to quantum measures. If the pattern matrices are 
furthermore normalized as to generate the correlation matri- 
ces with a unit trace, then they correspond to quantum prob- 
ability distributions, or to quantum states. 
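The similarity and ranking measures of this section can be evaluated on a toy pattern matrix. The sketch assumes real values; the matrix `A`, the choice of the standard basis as the basis of the whole property space, and the function names are assumptions of the example. Dividing `A` by its Frobenius norm is one way to give $M = A^\dagger A$ a unit trace.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(m, x):
    return [inner(row, x) for row in m]

def unit(x):
    n = math.sqrt(inner(x, x))
    return [c / n for c in x]

# Hypothetical pattern matrix: rows are items, columns are users.
A = [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 2.0]]
columns = transpose(A)

def user_similarity(u, v):
    """s(u, v): inner product of the normalized columns of A."""
    return inner(unit(columns[u]), unit(columns[v]))

def rank(a, x):
    """tr_M(x) = <x|A^T A|x> = <Ax|Ax>."""
    ax = matvec(a, x)
    return inner(ax, ax)

# Rescale A so that M = A^T A has unit trace.
frobenius = math.sqrt(sum(c * c for row in A for c in row))
A_state = [[c / frobenius for c in row] for row in A]

basis = [[1.0, 0.0], [0.0, 1.0]]
total = sum(rank(A_state, e) for e in basis)  # trace of the rescaled M
```

After the rescaling, the rankings of the basis vectors are non-negative and add up to 1, which is the sense in which the rescaled correlation matrix behaves like a probability distribution.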



Bell's inequality of similarities 

In this final section, we attempt to use the described mea- 
sure of similarity of users' tastes, derived from their past 
ratings of similar items, to predict the probability that they 
will agree in their future ratings. Although based on a sim- 
ple, intuitive view of similarity and agreement, this predic- 
tion turns out to be impossible, as it leads to a contradiction. 
This impossibility result can be viewed as an indicator of a 
quantum statistical correlation, or at least as evidence that 
there is a problem with the straightforward statistical model 
of this simple situation. 

The contradiction arises along the lines of Bell's derivation
of his notable inequality (Bell 1964). More precisely,
for any pair of users $x, y \in U$, represented by the unit vectors
$x, y : J \to R$, derived from their past ratings of the same
items, we consider the random variables $X, Y : J' \to \{0, 1\}$,
over a possibly larger set of items $J'$. Suppose that $X(i) = 1$
means that the user $x$ likes the item $i$, and that $X(i) = 0$
means that she does not like it. We assume that the probability
$P(X = Y) \in [0, 1]$ that $X$ and $Y$ will agree is proportional
to their past similarity $s(x, y) \in [-1, 1]$, modulo the
rescaling of $[-1, 1]$ to $[0, 1]$. This induces a constraint on the
similarities.

Proposition. Let the past preferences of users $x_0, x_1, y_0, y_1 \in U$
be given as unit vectors $x_0, x_1, y_0, y_1 : J \to R$. If the probability
of their future agreement is determined by rescaling
the similarity of their past preferences

$$P(X = Y) = \frac{1 + s(x, y)}{2}$$

then their similarities must satisfy the following condition:

$$s(x_0, y_1) + s(x_1, y_1) + s(x_1, y_0) - s(x_0, y_0) \le 2 \qquad (1)$$

This follows from the general fact that the disagreement of 
{0, 1}-valued random variables is a distance function.

Lemma. Any three random variables $X, Y, Z : J \to \{0, 1\}$ satisfy

$$P(X \neq Z) \le P(X \neq Y) + P(Y \neq Z) \qquad (2)$$

Proof. Let $W_{XY} : J \to \{0, 1\}$ be the random variable

$$W_{XY}(i) = \begin{cases} 1 & \text{if } X(i) \neq Y(i) \\ 0 & \text{if } X(i) = Y(i) \end{cases}$$

We claim that

$$W_{XZ} \le W_{XY} + W_{YZ} \qquad (3)$$

Towards the contradiction, suppose that there is $j \in J$
with $W_{XZ}(j) > W_{XY}(j) + W_{YZ}(j)$. This means that
$W_{XZ}(j) = 1$, but $W_{XY}(j) = W_{YZ}(j) = 0$, and thus
$X(j) \neq Z(j)$ but $X(j) = Y(j)$ and $Y(j) = Z(j)$, which
is clearly impossible. Therefore (3) must be true. But since
$P(X \neq Y) = E(W_{XY})$, averaging (3) gives (2). □
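The lemma can also be confirmed by exhaustive search on a small sample space. The sketch below assumes a set of three items with the uniform distribution and enumerates all triples of {0, 1}-valued random variables on it; the size of the sample space and the function names are assumptions of the example, and the check is an illustration rather than a replacement for the proof.

```python
from itertools import product

N = 3  # hypothetical number of items, each with probability 1/N

def disagreement(x, y):
    """P(X != Y) under the uniform distribution on the N items."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

# Every {0,1}-valued random variable on N items is a tuple of N bits.
variables = list(product([0, 1], repeat=N))

violations = [
    (x, y, z)
    for x in variables
    for y in variables
    for z in variables
    if disagreement(x, z) > disagreement(x, y) + disagreement(y, z) + 1e-12
]
```

The list of violations is empty, so the disagreement probability behaves as a distance on this sample space.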



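Proof of the Proposition. Since $P(X = Y) = \frac{1 + s(x, y)}{2}$, it
follows that $P(X \neq Y) = \frac{1 - s(x, y)}{2}$. Applying (2) twice gives
$P(X_0 \neq Y_0) \le P(X_0 \neq Y_1) + P(Y_1 \neq X_1) + P(X_1 \neq Y_0)$,
and substituting the above expression for each term gives (1). □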



Corollary. The probability of users' future agreement
$P(X = Y)$ cannot be derived by rescaling the past similarities
of their tastes $s(x, y)$, where the similarity measure
$s$ is defined by the inner product. The reason is that formula
(1), which would have to be satisfied, does not always hold.

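Proof. The taste vectors $x_0 = (1, 0)$, $y_0 = (-1, 0)$,
$x_1 = \left(-\frac{1}{2}, \frac{\sqrt{3}}{2}\right)$ and $y_1 = \left(\frac{1}{2}, \frac{\sqrt{3}}{2}\right)$ provide a counterexample
for (1): the left-hand side evaluates to
$\frac{1}{2} + \frac{1}{2} + \frac{1}{2} + 1 = \frac{5}{2} > 2$. □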
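The counterexample is easy to check numerically. The sketch below takes the four taste vectors to be unit vectors at angles 0°, 60°, 120° and 180°, which is how the vectors in the proof are read here; the variable names are assumptions of the example.

```python
import math

def s(x, y):
    """Similarity as the inner product of unit vectors."""
    return sum(a * b for a, b in zip(x, y))

def p_disagree(x, y):
    """P(X != Y) under the rescaling assumed in the Proposition."""
    return (1 - s(x, y)) / 2

h = math.sqrt(3) / 2
x0, y0 = (1.0, 0.0), (-1.0, 0.0)
x1, y1 = (-0.5, h), (0.5, h)

# Left-hand side of inequality (1).
lhs = s(x0, y1) + s(x1, y1) + s(x1, y0) - s(x0, y0)

# The same violation, seen as a failure of the triangle inequality (2).
direct = p_disagree(x0, y0)
detour = p_disagree(x0, y1) + p_disagree(y1, x1) + p_disagree(x1, y0)
```

The left-hand side of (1) comes out above 2, and the implied disagreement probability between `x0` and `y0` exceeds the sum along the detour through `y1` and `x1`, which no family of {0, 1}-valued random variables can realize.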



Interpretation. Why is it not justified to predict future 
agreements from past similarities, both defined in intuitively 
obvious ways? One line of explanation is that the independence
assumptions are violated. As usual, the dependencies
can be explained in terms of hidden variables (e.g., off-line
interactions of the users), or in terms of non-local interactions.
Another line of explanation is that the dependencies
are introduced in the model itself. Intuitively, this
means that the users, whose agreements are predicted, have 
not been sampled in the same measure space, and that their 
preferences should not be statistically mixed. 

Remark. Rather than derived from similarity, users' semantical
distance can be defined by $P(X = Y) = \|Ax - Ay\|_\infty$.
A reader familiar with quantum probability theory
(Meyer 1986; Meyer 1993) will recognize this interaction of
the Hilbert space $\ell_2$ and the Banach space $\ell_\infty$, which acts
on it as a von Neumann algebra, as the familiar interface
between the quantum and the classical probabilities.

Conclusion and future work 

We have shown that already in the basic, but sufficiently 
abstract models of information retrieval, data mining, and 
other forms of data analysis, a suitable version of Bell's ar- 
gument applies, suggesting that the quantum statistical ap- 
proach may be necessary. 

The simple interpretation of Bell's argument is that the 
quantum statistical predictions refer to non-local interac- 
tions. More subtle interpretations lead into the issues of contextuality
(Bell 1987, p. 9). In some cases, of course, both
the non-local interactions and the contextual dependencies
arise as a figment of the statistical model, mixing variables
that cannot be sampled together. Either way, the version of
the argument presented above suggests that simple-minded predictions
based on the vector space model of information processing
in a network may lead to problems if the locality of
the interactions is not taken into account. Is it possible that 
genuine entanglement phenomena arise on a network? 

After a moment of thought about this question, one gets a 
strange feeling that quantum probability might in fact be eas- 
ier to comprehend in the realm of network computation, than 
in physics. While action at a distance is a highly unintuitive
phenomenon in physics (Einstein called it "spooky"), in
network computation it can be reduced to the fact that the information
may flow not only through the network links, but
also off the network. This fact is not only intuitively natural, 
in the sense that, say, the data on the Web move not only in 
packets, along the Internet links, but they also get teleported 
from site to site, by people talking to each other, and then 
typing on their keyboards; but it is also information theoret- 
ically robust, in the sense that there are always covert chan- 
nels. In abstract models, they can be represented in terms 
of non-local hidden variables, or in terms of entanglement. 
Either way, the operational content of quantum statistical 
methods will undoubtedly broaden the algorithmic horizons 
of network computation and data analysis, already by an- 
alyzing the meaning of the notable quantum algorithms in 
physics-free implementations. Convenient toolkits for combining
quantum states, and for composing quantum operations
(Coecke & Pavlovic 2007), are likely to acquire new
roles in latent semantics. On the other hand, the generic
no-cloning and no-broadcasting theorems (Barnum et al. 2006)
are likely to point to some interesting statistical limitations,
with a potential impact in security.